Abstract

Random linear network codes can be designed and implemented in a distributed manner, with low computational 
complexity. However, these codes are classically implemented [1] over finite fields whose size depends on some
global network parameters (the size of the network, the number of sinks) that may not be known prior to code design.
Also, if new nodes join, the entire network code may have to be redesigned.

In this work, we present the first universal and robust distributed linear network coding schemes. Our schemes 
are universal since they are independent of all network parameters. They are robust since if nodes join or leave, 
the remaining nodes do not need to change their coding operations and the receivers can still decode. They are 
distributed since nodes need only have topological information about the part of the network upstream of them, 
which can be naturally streamed as part of the communication protocol. 

We present both probabilistic and deterministic schemes that are all asymptotically rate-optimal in the coding 
block-length, and have guarantees of correctness. Our probabilistic designs are computationally efficient, with 
order-optimal complexity. Our deterministic designs guarantee zero error decoding, albeit via codes with high 
computational complexity in general. Our coding schemes are based on network codes over "scalable fields". 
Instead of choosing coding coefficients from one field at every node as in [1], each node uses linear coding
operations over an "effective field-size" that depends on the node's distance from the source node. The analysis of
our schemes requires technical tools that may be of independent interest. In particular, we generalize the
Schwartz-Zippel lemma [2] by proving a non-uniform version, wherein variables are chosen from sets of possibly different
sizes. We also provide a novel robust distributed algorithm to assign unique IDs to network nodes. 

I. Introduction 

The paradigm of network coding allows each node in a network to process information in a non-trivial 
manner. As shown in [3], [4], [5], even if intermediate nodes simply perform linear operations over
some finite field, the resulting network codes can be information-theoretically rate-optimal for a large
class of communication problems. In particular, algorithms that design codes for multicast communication
problems, wherein each of multiple sinks requires the same information from a source node, have been
well-studied. The design algorithms in [6], [7] are deterministic and centralized, and result in network
codes with zero error. In contrast, the algorithms in [1], [8] are decentralized and probabilistic, and for any
$\epsilon > 0$ result in network codes that "fail" with probability at most $\epsilon$. Both these types of design algorithms
(and the resulting network codes) are computationally tractable.

However, for all current network code design algorithms, some information about network parameters is
necessary prior to the code design, to determine the size of the finite field over which linear network
coding is performed. In particular, the centralized algorithms in [6], [7] require prior knowledge of the
entire network, and even the decentralized algorithms in [1], [8] require knowledge of the network size and
the number of sinks. If these parameters are unavailable, code design cannot proceed with any guarantees
of correctness; hence prior designs are not universal. Also, in the case of dynamically changing network
topologies, if even one new node joins the network, the entire network code may need to be updated due
to a change in the required field-size; hence such codes are also not robust.

In this work we develop the first universal and robust distributed linear codes that are independent 
of all network parameters, and are designed to satisfy a pre-specified tolerance on the error-probability 
(defined as the probability that the linear transform from the source to some sink is not invertible). The 
essential idea behind our design is that of using "scalable fields".^1 Linear coding operations are chosen
from nested finite subsets of an appropriate infinite field - in particular we choose $\mathbb{F}_2(z)$, the field of
rational functions over $\mathbb{F}_2$, i.e., the field whose elements are ratios of binary polynomials. Operations over
this field can be implemented via binary filters (or equivalently, convolutional codes) at each node. For
instance, a node that chooses to implement the operation $1 + z^2$ on an incoming binary sequence $x(n)$ to
generate an outgoing binary sequence $y(n)$ would set $y(n) = x(n) + x(n-2)$. Convolutional network
coding as a model of linear network coding has been well-studied (see for example [9]).

1 Scalable fields do not mean embeddings of, for example, $\mathbb{F}_2$ into $\mathbb{F}_{2^2}$, since arithmetic operations are defined differently over different
fields, and hence the overall transform would not be linear.
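
The binary filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the original design: the function name `apply_filter` and the tap-list representation are hypothetical, and samples outside the input sequence are taken to be zero.

```python
def apply_filter(taps, x):
    """Filter the bit sequence x with a binary polynomial given as a tap list.

    taps[k] is the coefficient of z^k, so taps = [1, 0, 1] stands for 1 + z^2.
    Addition over F_2 is XOR; samples outside x are treated as 0 (assumption).
    """
    y = []
    for n in range(len(x) + len(taps) - 1):
        bit = 0
        for k, tap in enumerate(taps):
            if tap and 0 <= n - k < len(x):
                bit ^= x[n - k]
        y.append(bit)
    return y


# y(n) = x(n) + x(n - 2) over F_2, i.e. the operation 1 + z^2.
x = [1, 1, 0, 1]
y = apply_filter([1, 0, 1], x)
```

The output is longer than the input by the degree of the filter, which is one reason the rate loss becomes negligible only as the block-length grows.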

As information percolates down the network, each node makes its own estimate of the "effective field 
size", i.e., the size of the subset of $\mathbb{F}_2(z)$ from which that node should choose its coding operations, so as
to meet the guarantee on the pre-specified tolerance on the overall error-probability. Our codes are able
to perform this book-keeping despite having access only to information that can be percolated down the 
network at rates that are asymptotically negligible in the block-length - like standard distributed network 
codes, our codes are also asymptotically rate-optimal. 

Our results are as follows. In Section III we prove a generalization of the Schwartz-Zippel lemma [2]
that is useful as a technical tool in some of our code constructions; it may also be of independent interest
for other universal algorithms.



In Section IV we present probabilistic universal and robust codes. That is, given any $\epsilon > 0$ and
any network, we present codes that guarantee that the linear transform from the source to each sink is
invertible with probability at least $1 - \epsilon$ - hence our codes are universal. Further, even if nodes join or
leave, pre-existing nodes do not need to change their coding operations to preserve the same guarantee
of correctness - hence our codes are robust. We present two such codes. The first code is independent of
network size, but does depend on the number of sinks. We present it primarily for expository purposes,
since its presentation is simpler than that of our second set of codes, which are independent of all
network parameters, including the number of sinks. In both sets of codes, each node bases its choice of coding
operations on its distance from the source node. While the effective field-size over which our codes
operate, and hence the computational complexity of our codes, are larger than those of prior distributed
designs [1], [8], the complexity of implementing them is still polynomial in network parameters. Also, we
present in Theorem 3 a class of networks that demonstrates that our codes have essentially order-optimal
computational complexity for universal codes.

In Section V we consider deterministic universal and robust codes. As a technical tool we first discuss
a decentralized algorithm to distribute unique IDs to each node in a robust manner - even if a new node
joins we guarantee that it too can be given an ID that is distinct from all others in the network. Building
on this tool, and a novel use of Cantor's classical mapping between $\mathbb{Z}$ and $\mathbb{Z}^n$ for any finite $n$, we design
zero-error decentralized codes that are independent of all network parameters, and robust to changes in
network topology. We provide two constructions. Our first construction, also primarily expository, is just
for codes of rate 2, and is computationally efficient to design and implement. Our second construction is for
arbitrary rate codes. This generalization comes at the cost of exponentially increasing the implementation
complexity, compared to our other constructions.^2

We note that all our algorithms provide guarantees of correctness as long as the source transmits 
information at a rate no greater than can be supported by the network, i.e., its min-cut. We view the 
process of determining this rate as a rate-control issue - our code designs are independent of the size of 
the min-cut. 

2 We distinguish between the computational complexity of design and that of implementation. The former refers to the computational
cost of designing the coding operations at each node, and is a one-time cost. The latter corresponds to the computational cost incurred by
each node as it implements the pre-designed coding operations, and is a repeated cost for each packet transmitted by that node. All our codes
have design complexity that is at most polynomial in the network parameters. Further, most of our designed codes have implementation
complexity that is also polynomial in network parameters; the only exception is the last of the proposed designs corresponding to the general 
design for zero-error universal and robust distributed linear codes. 
A. Related work

The distributed random linear codes of [1], [8] require field-sizes to scale roughly as $|\mathcal{E}||\mathcal{T}|$. As shown
in [10], even with centralized design of network codes, the field size over which coding must be performed
is at least $|\mathcal{T}|$.

As to universal codes (codes independent of some problem parameters), they have been well-studied
in the classical information-theory setting (for instance in source coding [11] and channel coding [12]).

In the network coding setting, however, the literature is much sparser. The work of [5] proposes "robust
network codes" that are resilient to network failure patterns. However, the field-size over which coding
is performed depends on the number of failure patterns, and hence these codes are not truly universal.
Further, the computational complexity of designing such codes is prohibitive. There is also significant
work on network coding for packet erasure networks (for instance [13]). Our codes can tolerate all such
errors.

The work of [14] examines "decentralized network coding" in which new nodes can join a network
without disrupting pre-designed coding operations. Here, too, the field-size choice for the initial design
depends on the size of the network. Further, the code designs are for special cases - either for rate-2
codes (analogous to the codes we present in Section V-C) or for networks with only two sink nodes.

A preliminary version of the work in Section V was previously in the thesis [15], and presented (but
not published) in [16].



II. Notation and definitions 

A. Network Model 

In this paper, we adopt the single-source multicast network model of [5]. Let the network be represented
by a directed acyclic^3 graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$. Here $\mathcal{V}$ represents the set of nodes and $\mathcal{E}$ the set of edges. The
graph has a pre-specified source node s and $|\mathcal{T}|$ sink nodes $\mathcal{T} = \{t_1, t_2, \ldots, t_{|\mathcal{T}|}\}$. A directed edge e from
node u to node v is said to have tail u (denoted tail(e)) and head v (denoted head(e)). The link e is
then said to be an incident outgoing link of u and an incident incoming link of v.



B. Communication Model 

The communication goal is for the source to communicate identical information to each sink. 

As is standard O, we assume that each link carries one packet of information per time-step. This is 
reasonable since if some link's capacity is less we may consider the link's communication to be over 
multiple successive time-steps, and if the link's capacity is greater we can subdivide it into multiple links. 
The packet-length in bits is denoted by n. 

The network capacity, denoted by C, is the time-average of the maximum number of packets that can
be delivered from the source s to each sink $t \in \mathcal{T}$ simultaneously. It can also be expressed as the minimum,
over all sinks t, of the min-cut from the source s to t. The rate R is the average number of information packets
that the source s generates per time-step, to be delivered to each sink t over the network $\mathcal{G}$. Without loss
of generality we assume that $R \le C$. Lastly, let c denote the maximum capacity of any single link in the
network.
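
As a small illustration of the definition of C as the minimum over sinks of the source-to-sink min-cut, the sketch below computes it by max-flow on a unit-capacity graph. The Edmonds-Karp routine, the function names, and the node labels of the butterfly network are all assumptions introduced here for illustration; the paper does not prescribe any particular min-cut algorithm.

```python
from collections import deque


def max_flow(edges, source, sink):
    """Edmonds-Karp max-flow; each listed edge has capacity one packet per step."""
    cap, adj = {}, {}
    for u, v in edges:
        cap[(u, v)] = cap.get((u, v), 0) + 1   # parallel edges add capacity
        cap.setdefault((v, u), 0)              # residual (reverse) edge
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Recover the path, find its bottleneck, and augment along it.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[e] for e in path)
        for u, v in path:
            cap[(u, v)] -= bottleneck
            cap[(v, u)] += bottleneck
        flow += bottleneck


def network_capacity(edges, source, sinks):
    """C = minimum over sinks of the min-cut (= max-flow) from the source."""
    return min(max_flow(edges, source, t) for t in sinks)


# Hypothetical labelling of the butterfly network.
butterfly = [("s", "a"), ("s", "b"), ("a", "t1"), ("a", "c"), ("b", "c"),
             ("b", "t2"), ("c", "d"), ("d", "t1"), ("d", "t2")]
capacity = network_capacity(butterfly, "s", ["t1", "t2"])
```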



C. Code Model 

1) Network code: The network code comprises the encoders at the source and each node inside the
network, and the decoders at each of the sinks. In particular we focus on linear network codes, i.e., codes
where the source node, each internal node, and each sink performs linear combinations of information in
packets on incident incoming links to generate packets on incident outgoing links. Specifically we consider
the class of convolutional linear operations, well-studied in classical coding theory, that we reprise below.
The base-field for arithmetic is chosen to be $\mathbb{F}_2$, hence all operations described below are binary.

3 The work can be directly extended to multi-source multicast networks, and over networks that may contain cycles, as long as each source 
has a unique identifier. To ease notational and description complexity we omit details here.
2) Convolutional network code: Recall that the z-transform [17] of any sequence $\{a(i)\}_{i=1}^{n}$ of bits
is given by the polynomial $\sum_{i=1}^{n} a(i) z^i$, denoted $A(z)$. Further, recall that the output of the convolution
operation $a * b$ between two sequences^4 $\{a(i)\}_{i=1}^{n}$ and $\{b(j)\}_{j=1}^{n'}$ is defined as the length-$(n + n')$ sequence
whose $i$th term equals $\sum_{j=1}^{i} a(j)\, b(i-j)$. Lastly, it is well-known that the z-transform of $a * b$ equals
$A(z)B(z)$.
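
The identity between convolution and polynomial multiplication can be checked directly on small examples. The sketch below is an illustration only: the function names are hypothetical, sequences are indexed from 1 as in the text (so `a[0]` holds a(1)), and polynomials over $\mathbb{F}_2$ are represented as sets of exponents, which is a convenience chosen here rather than anything taken from the paper.

```python
def convolve_f2(a, b):
    """Convolution over F_2; out[i-1] holds (a*b)(i) for i = 1..len(a)+len(b)."""
    n, m = len(a), len(b)
    out = [0] * (n + m)
    for i in range(1, n + m + 1):
        for j in range(1, n + 1):
            k = i - j                     # index into b; b(k) = 0 outside 1..m
            if 1 <= k <= m:
                out[i - 1] ^= a[j - 1] & b[k - 1]
    return out


def z_transform(seq):
    """Set of exponents i with seq(i) = 1, i.e. the polynomial sum_i seq(i) z^i."""
    return {i + 1 for i, bit in enumerate(seq) if bit}


def multiply_f2(p, q):
    """Product of two F_2 polynomials given as sets of exponents."""
    prod = set()
    for e1 in p:
        for e2 in q:
            prod ^= {e1 + e2}             # symmetric difference = addition mod 2
    return prod


a = [1, 0, 1]        # A(z) = z + z^3
b = [1, 1]           # B(z) = z + z^2
lhs = z_transform(convolve_f2(a, b))
rhs = multiply_f2(z_transform(a), z_transform(b))
```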

Convolutional codes [17] have long been used in point-to-point communication scenarios. The idea of
using convolutional codes for network coding (in networks with cycles) was foreshadowed in [3], and
made explicit in [5] (which also noted that such an algebraic model for coding operations can help kill two
birds with one stone, i.e., it can also help model delays in networks). The work of Erez et al. (see for
example [9]) gave the first efficient designs for convolutional network codes, i.e., codes over $\mathbb{F}_2(z)$. In
our work, $\mathbb{F}_2(z)$ affords the advantage that it allows for coding operations to be chosen from a potentially
unbounded set, which helps us circumvent the difficulty that we do not know the network's parameters
in advance.

The source's packets are denoted by $X_1, X_2, \ldots, X_R$ - each is a length-$n$ bit-vector. The corresponding
z-transforms are denoted $X_1(z), X_2(z), \ldots, X_R(z)$. Collectively they are represented by the length-$R$
vector of polynomials $X(z)$. Each edge $e$ carries the packet $Y_e$, and its z-transform is denoted $Y_e(z)$.
Lastly, the z-transforms of the packets on incident incoming links to any sink $t$ are denoted by the vector
$Y_t(z)$. We henceforth refer to a sequence and its z-transform interchangeably.

Let u, v and w be three nodes such that there is at least one edge from u to v and at least one edge
from v to w. We use a 5-tuple to denote a coding choice for such nodes - specifically, $\beta_{u,i,v,j,w}(z)$ refers
to the local coding coefficient of the convolutional operation from the information on the ith edge from u
to v to the jth edge from v to w. The choices of values of the local coding coefficients $\beta_{u,i,v,j,w}(z)$ are
code design parameters whose specifications are the primary objective of subsequent sections. Let e be
a specific edge from v to w, and e' denote a dummy variable that ranges over all edges incoming to v
(and hence is indexed by the pair (u,i)). Thus the convolutional operation that is performed at node v
comprises taking linear combinations of the information $Y_{e'}$ with the appropriate $\beta_{u,i,v,j,w}(z)$ over all
edges e' incident incoming to node v, to generate the information $Y_e$ on the edge e incident outgoing from
v. To simplify notation, we henceforth write $\beta_{u,i,v,j,w}(z)$ simply as $\beta_{ee'}(z)$ with the understanding that
(e, e') index the appropriate 5-tuple. Thus the linear transform at each node can be written symbolically
as

$$Y_e(z) = \sum_{e'} \beta_{ee'}(z)\, Y_{e'}(z).$$

Since all the linear operations performed by the network can be represented via operations over polynomials
over the binary field, we henceforth consider all arithmetic to be over the field of rational functions
over $\mathbb{F}_2$, denoted by $\mathbb{F}_2(z)$. The elements of this field are of the form $P(z)/Q(z)$, where both $P(z)$ and
$Q(z)$ are binary polynomials. Linear codes over this field have been well-studied in the convolutional
coding literature [17].

As in classical distributed network codes [1], the codes in this work are distributed, i.e., the choice of a
value for $\beta_{u,i,v,j,w}(z)$ at node v can depend only on its local parameters (u, i, v, j, w), and the corresponding
parameters of the nodes upstream of node v. Since we consider only directed acyclic networks in this
work, this imposes a significant design constraint, since nodes that cannot directly communicate with each
other over the network cannot coordinate their coding choices.

One idea of [1] that we too use is the idea of having "short headers" in each transmitted packet.
Specifically, each packet (containing n bits) transmitted by the source also contains the linear transformation
induced by the network from the source to that packet - as in [1] these transforms are computed
in a distributed manner and percolated down the network along with the payload information at an
asymptotically negligible rate. For every $t \in \mathcal{T}$, let $T_t$ be the network transfer matrix from s to t - these
too can be computed in a distributed manner. Let $T$ be the overall network transfer matrix from s to $\mathcal{T}$,
formed as $\prod_{t \in \mathcal{T}} T_t$. Let $|T_t|$ and $|T|$ denote the determinants of $T_t$ and $T$ respectively.

4 Terms a(i) and b(j) are respectively set to zero for $i \notin \{1, 2, \ldots, n\}$ and $j \notin \{1, 2, \ldots, n'\}$.

Our codes are either probabilistic or deterministic depending on whether local coding coefficients are
chosen probabilistically or deterministically. The error probability is the probability, over choices of local
coding coefficients, that at least one sink's reconstruction of at least one possible
message from the source is inaccurate. For linear network codes this happens if and only if the transfer
matrix from the source to at least one sink is not invertible. Rate R is said to be achievable if for any $\epsilon > 0$ and
$\delta > 0$ there exists a coding scheme of block length n with rate $> R - \delta$ and error probability $< \epsilon$. In
particular, we require our deterministic network codes to be zero-error, i.e., to have error probability
zero.

III. The Generalized Schwartz-Zippel Lemma 

The classical Schwartz-Zippel lemma [2] provides an upper bound on the probability that when variables
of a polynomial are chosen uniformly at random from a field, then the polynomial evaluates to zero.

Recall that the degree $d_i$ of a variable $x_i$ in a polynomial $P(x_1, x_2, \ldots, x_n)$ is the maximal exponent
of $x_i$ in its non-zero terms. Further, recall that the degree $d$ of a polynomial $P(x_1, x_2, \ldots, x_n)$ itself is
the maximal value among the sums of the exponents of all its non-zero terms. Note that $d \le \sum_i d_i$.

Lemma 1 (Schwartz-Zippel lemma [2]). Let $P(x_1, x_2, \ldots, x_n)$ be a non-zero polynomial of degree $d \ge 0$
over a field $\mathbb{F}$. Let $S$ be a finite subset of $\mathbb{F}$, and the value of each of $x_1, x_2, \ldots, x_n$ be selected independently
and uniformly at random from $S$. Then the probability that the polynomial equals zero is at most $d/|S|$.

The Schwartz-Zippel lemma is a useful tool in the analysis of random linear network codes (for instance
[1]). A random linear network code causes an error if and only if one of the transfer matrices from
the source to the destination is singular. This in turn happens if and only if the product of the determinants 
of these transfer matrices equals zero. But this product of determinants may be viewed as a polynomial 
whose variables consist of the local coding coefficients at each node. Hence the Schwartz-Zippel lemma 
provides an upper bound on the probability of error of a random linear network code. 

In this work we are interested in a generalization of the Schwartz-Zippel lemma, for polynomials whose 
variables are chosen from different subsets of F. We prove: 

Lemma 2. Let $P(x_1, x_2, \ldots, x_N)$ be a non-zero polynomial over a field $\mathbb{F}$. For all $i \in \{1, 2, \ldots, N\}$, let
$S_i$ be a finite subset of $\mathbb{F}$, the degree of $x_i$ in $P(x_1, x_2, \ldots, x_N)$ be $d_i$, and the value of each $x_i$ be selected
independently and uniformly at random from $S_i$. Then the probability that the polynomial equals zero is
at most $\sum_{i=1}^{N} d_i/|S_i|$.

Proof: Given in Appendix A. □

Note 1: Neither Lemma 1 nor Lemma 2 puts any restriction on the size of the field $\mathbb{F}$, as long as the
appropriate subsets from which variables are chosen are finite.

Note 2: A related but inequivalent generalization of the Schwartz-Zippel lemma was proved in [18].
The utility of Lemma 2 is that it allows for the variables comprising the polynomial to be chosen
non-uniformly, i.e., from sets of different sizes. This is integral to the proof techniques in this work, wherein we choose local coding coefficients
over progressively larger sets, depending on how far from the source they are.
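
The bound of Lemma 2 can be sanity-checked by brute force on a toy instance. The example below is purely illustrative: the prime field $\mathbb{F}_7$, the polynomial, and the two sampling sets are hypothetical choices made here, and the check is an enumeration, not the proof technique of Appendix A.

```python
from fractions import Fraction
from itertools import product


def zero_probability(poly, sets, modulus):
    """Exact probability that poly(x_1..x_N) = 0 mod `modulus` when each x_i is
    drawn independently and uniformly from sets[i]."""
    total = zeros = 0
    for point in product(*sets):
        total += 1
        if poly(*point) % modulus == 0:
            zeros += 1
    return Fraction(zeros, total)


def example_poly(x1, x2):
    """Hypothetical polynomial P(x1, x2) = x1^2 * x2 + x1 + 3 over F_7."""
    return x1 * x1 * x2 + x1 + 3


sets = [[0, 1, 2], [0, 1, 2, 3, 4, 5]]    # |S_1| = 3, |S_2| = 6
degrees = [2, 1]                           # d_1 = 2, d_2 = 1
bound = sum(Fraction(d, len(s)) for d, s in zip(degrees, sets))
prob = zero_probability(example_poly, sets, 7)
```

Here the non-uniform bound evaluates to 2/3 + 1/6, and the enumerated zero probability sits well below it, as the lemma requires.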

IV. Probabilistic designs 

In this section we describe two probabilistic designs of universal distributed robust network codes. In
particular, given any $\epsilon > 0$, we present schemes such that the overall error probability of the code is at
most $\epsilon$.

Our first scheme is independent of the size (number of nodes/edges) of the network, but does require
that the source has a priori knowledge of the number of sinks it shall be required to service. Hence we
say it is only weakly universal. Our purpose in presenting this scheme is primarily expository, since the
proof is significantly easier than that of the second scheme - it helps set the stage for the second scheme.

The second scheme is strongly universal and is independent of all network parameters, including the 
number of sinks. 

We first describe some useful preprocessing steps relevant for both of our schemes.

A. Graph transformation

We find it desirable to work over a transformed graph $\mathcal{G}' = (\mathcal{V}', \mathcal{E}')$ rather than the original graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$.
This transformation can be done locally at each node, and results in a graph with some useful properties.
In particular, we use the work of [7], which demonstrates the equivalence between general network coding
problems and those over "low-degree" networks where each node has degree at most three. Nodes
in the reduced network either have one incident incoming edge and at most two incident outgoing
edges (in which case they broadcast the incoming information on incident outgoing edges, and hence are
called broadcasting nodes), or they have two incident incoming edges and one incident outgoing
edge (in which case they code the information on the incident incoming edges to generate information
transmitted on the incident outgoing edge, and hence are called coding nodes). (See Figure 3.) This
equivalence is useful for our probabilistic algorithms since it allows us to effectively enumerate networks.
We change the equivalence relationship of [7] slightly as described below so as to make it robust to nodes
joining the original network. That is, in our equivalence relationship, nodes can join the original network
while only locally perturbing the "low-degree" network.

The transformation is as follows. For every node $v \in \mathcal{V}$ we construct a virtual robust gadget $G(v)$ (see
Figure 1). Suppose v has $d_{in}(v)$ incoming links and $d_{out}(v)$ outgoing links. Corresponding to each incoming link
we construct a binary tree whose root is connected to that incoming link, and which has $d_{out}(v) + 1$
leaves. Similarly, corresponding to each outgoing link we construct an inverted binary tree whose root is
connected to that outgoing link, and which has $d_{in}(v) + 1$ leaves. The last leaf node of each binary tree is
called a virtual node, and the other leaf nodes are called connection nodes. We then connect connection
nodes so that there is exactly one path from each incoming link to each outgoing link (the connection
order does not matter).

Fig. 1. A node v with three incoming and three outgoing links, and its corresponding virtual robust gadget.

Fig. 2. Modification to the robust virtual gadget G(v) of node v when a new node w connects to v.

5 We thank Michael Langberg for providing the template for Figures 1 and 2.

Suppose a new link^6 is created in the network - say, a link directed from w to v (see Figure 2 for an example).
In this case we first create a virtual gadget corresponding to the directed edge (w,v). We then split each
virtual node on the inverted binary trees (corresponding to the outgoing links of v) into two by appending a
binary tree of depth one to it. We denote the second of the two new leaf nodes as a new virtual node, and
the first as a new connection node. The connection nodes on (w,v)'s virtual gadget are then connected
to the new connection nodes on each of the outgoing links' virtual gadgets so that there is exactly one
path from (w, v) to each link outgoing from v. A corresponding (but inverted) procedure holds if the
new link is instead a link outgoing from v. The removal of a link simply corresponds to removal
of the corresponding virtual gadgets on the incoming and outgoing sides, and all links connected to them.
The virtual nodes in each virtual gadget are what give our transformation robustness, since in case a new
node joins or leaves the network, nodes other than the ones directly connecting with the changing node
experience no structural changes in their existing virtual gadgets.^7
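
The bookkeeping behind the gadget can be illustrated with a deliberately simplified, leaf-level model. This sketch is an assumption-laden illustration, not the construction itself: internal nodes of the binary trees are not modelled, and the function names and link labels are hypothetical. It only shows that every (incoming, outgoing) pair gets exactly one connection, that each tree keeps one spare virtual leaf, and that adding an incoming link leaves all existing connections untouched.

```python
def build_gadget(incoming, outgoing):
    """Leaf-level model of G(v).

    Returns (connections, virtual). `connections` maps each (in_link, out_link)
    pair to the two connection leaves joined for it; `virtual` lists the spare
    virtual leaf kept on every tree. Each incoming tree thus has
    len(outgoing) + 1 leaves, and each outgoing tree len(incoming) + 1.
    """
    connections = {}
    for a in incoming:
        for b in outgoing:
            # Leaf of a's tree reserved for b, joined to leaf of b's tree reserved for a.
            connections[(a, b)] = (("leaf", a, b), ("leaf", b, a))
    virtual = [("virtual", link) for link in list(incoming) + list(outgoing)]
    return connections, virtual


def add_incoming_link(connections, virtual, outgoing, new_link):
    """Attach a new incoming link without modifying existing connections."""
    for b in outgoing:
        # The virtual leaf on b's tree is split in two: one child becomes the
        # connection leaf for new_link, the other is the new virtual leaf.
        connections[(new_link, b)] = (("leaf", new_link, b), ("leaf", b, new_link))
    virtual.append(("virtual", new_link))   # the new link's own tree has a virtual leaf


incoming = ["in1", "in2", "in3"]
outgoing = ["out1", "out2", "out3"]
connections, virtual = build_gadget(incoming, outgoing)
snapshot = dict(connections)
add_incoming_link(connections, virtual, outgoing, "w_to_v")
```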

Henceforth, all algorithms in Section IV shall convert the original graph to the virtual graph above
as a pre-processing step, and all computations shall be over this virtual graph. Also, as part of normal
communication each node v in the virtual graph $(\mathcal{V}', \mathcal{E}')$ estimates its depth $\Delta(v)$, i.e., the length of the
shortest path from the source to itself. This can be done by any of a variety of distributed shortest-path
algorithms over acyclic graphs, such as the Bellman-Ford algorithm [19].
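
For intuition, the depth of each node can be computed as a hop count from the source. The sketch below uses a centralized breadth-first search as a stand-in for the distributed shortest-path computation mentioned above; that substitution, the function name, and the butterfly labelling are assumptions made here for illustration.

```python
from collections import deque


def depths_from_source(edges, source):
    """Hop-count depth of every node reachable from `source` (breadth-first search)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    depth = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in depth:            # first visit gives the shortest hop count
                depth[v] = depth[u] + 1
                queue.append(v)
    return depth


# Hypothetical labelling of the butterfly network.
butterfly = [("s", "a"), ("s", "b"), ("a", "t1"), ("a", "c"), ("b", "c"),
             ("b", "t2"), ("c", "d"), ("d", "t1"), ("d", "t2")]
depth = depths_from_source(butterfly, "s")
```

In a distributed setting each node would instead learn its depth from the depths reported by its upstream neighbours, taking the minimum and adding one.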





Fig. 3. Coding operations defined on the nodes of $\mathcal{G}'$: (a) coding node; (b) broadcasting node.



B. Weakly universal design 

The essential idea behind our first scheme is as follows. Each node, having estimated its depth,
chooses a subset of $\mathbb{F}_2(z)$ whose size scales exponentially in this depth, from which it picks its coding
coefficients uniformly at random. We then show that the probability of error due to information being lost
at any depth decays geometrically in the depth, and hence by the union bound the overall probability of
error can be controlled so as not to exceed any desired $\epsilon$.

6 A new node is treated as the corresponding set of new links created in the network. Similarly, a node's departure is treated as the set of
corresponding links being removed.

7 The addition of virtual nodes and the corresponding robust connection procedure is the only substantive difference between our construction
and that in [7].

WUP($\epsilon$, $|\mathcal{T}|$) (Weakly Universal Probabilistic) Code:

• Each coding node v in the vertex-set $\mathcal{V}'$ of the virtual graph chooses two local coding coefficients
corresponding to the two incoming links uniformly at random from the set of polynomials of degree
at most $2\Delta(v) + 1 + \log(R|\mathcal{T}|/\epsilon)$.^8
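
The coefficient choice in the WUP code can be sketched as follows. This is a minimal illustration under stated assumptions: polynomials are represented as lists of bits, the logarithm is rounded up when it is not an integer, and all function names are hypothetical.

```python
import math
import random


def wup_degree_bound(depth, rate, num_sinks, eps):
    """Degree bound 2*depth + 1 + log2(rate * num_sinks / eps), log rounded up."""
    return 2 * depth + 1 + math.ceil(math.log2(rate * num_sinks / eps))


def random_binary_polynomial(max_degree, rng=random):
    """Uniformly random F_2 polynomial of degree at most max_degree.

    coeffs[k] is the coefficient of z^k.
    """
    return [rng.randint(0, 1) for _ in range(max_degree + 1)]


def wup_coefficients(depth, rate, num_sinks, eps, rng=random):
    """Two local coding coefficients for a coding node, one per incoming link."""
    bound = wup_degree_bound(depth, rate, num_sinks, eps)
    return [random_binary_polynomial(bound, rng) for _ in range(2)]


# A coding node at depth 3, with R = 2, four sinks and eps = 1/4.
coeffs = wup_coefficients(depth=3, rate=2, num_sinks=4, eps=0.25)
```

Note that the bound grows linearly in the depth, so the size of the sampling set grows exponentially in the depth, which is what the analysis relies on.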

Theorem 1. For any $\epsilon > 0$, WUP($\epsilon$, $|\mathcal{T}|$) has error probability at most $\epsilon$.

Proof: Recall that $|T_t|$ represents the determinant of the transfer matrix from the source to sink t. As noted
in [1], the network code is error-free if and only if the polynomial $\prod_t |T_t|$, comprising the product of
the determinants $|T_t|$ over all sinks (with the network's local coding coefficients $\beta_{u,i,v,j,w}(z)$ as variables),
is non-zero. To evaluate the probability that this is the case given the random assignment of local coding
coefficients in WUP($\epsilon$, $|\mathcal{T}|$), we use Lemma 2. Specifically, each variable $x_i$ in Lemma 2 corresponds to
a local coding coefficient. We group the coding coefficients $\beta_{u,i,v,j,w}(z)$ in terms of the depths $\Delta(v)$ of the
nodes at which they are used. There are at most $2^{\Delta} R$ coding nodes at any depth $\Delta$ in the virtual graph,
since after the transformation in Section IV-A the fastest possible growth-rate for the new graph would be
if it corresponded to R parallel binary trees - one for each of the source's messages. Hence there are
at most $2R\, 2^{\Delta}$ local coding coefficients at that depth. Also, Corollary 1 in [1] shows that the degree of
each local coding coefficient in $\prod_t |T_t|$ is bounded from above by $|\mathcal{T}|$. By construction in WUP($\epsilon$, $|\mathcal{T}|$)
each coding node in the virtual graph chooses local coding coefficients uniformly at random from the set of
polynomials of degree at most $2\Delta(v) + 1 + \log(R|\mathcal{T}|/\epsilon)$. This set is a subset of $\mathbb{F}_2(z)$ of size at least
$2^{2\Delta(v) + 1 + \log(R|\mathcal{T}|/\epsilon)} = 2R|\mathcal{T}|\, 2^{2\Delta(v)}/\epsilon$. Summing over all possible local coding coefficients at all possible
(possibly infinite) depths and substituting the appropriate parameters into Lemma 2, the error probability
of the network code is bounded from above by

$$\Pr\Big(\prod_t |T_t| = 0\Big) \le \sum_{\Delta=1}^{\infty} \big(2R\, 2^{\Delta}\big)\, |\mathcal{T}|\, \frac{\epsilon}{2R|\mathcal{T}|\, 2^{2\Delta}} = \sum_{\Delta=1}^{\infty} \frac{\epsilon}{2^{\Delta}} = \epsilon.$$


The computational complexity of WUP($\epsilon$, $|\mathcal{T}|$) codes is polynomial in network parameters and $\log(1/\epsilon)$,
and the achievable rates approach the network capacity C asymptotically in the block-length. Further, our
codes are robust to links joining and leaving. Since the analysis of these properties is very similar to that
of the codes in Section IV-C, we delay discussion to the end of that section.



C. Strongly universal design 

We now present the design of probabilistic robust linear network codes that are strongly universal, i.e.,
independent of all network parameters. This obviates the requirement, in the codes of Section IV-B, of
knowledge of $|\mathcal{T}|$. The idea underlying the construction in this section is as follows. For the purpose of
analysis, for each sink we identify a set of edge-disjoint paths, and estimate the probability that the
information on these edge-disjoint paths remains invertible as information flows through the network. In
particular, for any sink t and any depth $\Delta$ in the network we identify the set of edges in these edge-disjoint
paths that must contain linearly independent combinations of the source's information. We call such sets
of edges flow-cuts. It turns out that the number of flow-cuts at any depth is in fact independent of the
number of sinks, and further, a bound on this number at each depth can be computed locally. Thus sinks
can be classified according to flow-cuts. Hence, instead of trying to ensure that the linear transform to each
sink is invertible as in Theorem 1, nodes at each depth simply try to ensure that the linear transform to
each flow-cut is invertible. To analyze the probability of non-invertibility at each flow-set an alternative
to the end-to-end analysis of the probability of error used in [1], [8] is required. Here we use the proof
technique of [20], which analyzes the probability that information gets lost from one set of edges in the
network to a neighbouring set of edges.

8 This choice of the degree bound is simply to ease the analysis of Theorem 1. All logarithms are binary. Also, for simplicity of presentation
we assume that $\log(R|\mathcal{T}|/\epsilon)$ is an integer - if not, we may round up to the nearest integer with negligible error in our estimate of parameters.

SUP($\epsilon$) (Strongly Universal Probabilistic) Code

• Each coding node $v$ at depth $\Delta(v)$ in the vertex-set $V$ of the virtual graph chooses two local coding coefficients, one for each of its two incoming links, uniformly at random from the set of polynomials of degree at most $(R + 1)(\Delta(v) + 1 + \log R) + \Delta(v) + \log(1/\epsilon) - 1$.
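As a rough illustration of the SUP rule above, the following sketch computes the degree bound for a node and draws its two coefficients. It assumes that polynomials over the binary field are encoded as Python integers (bit $k$ is the coefficient of $z^k$) and that the logarithms are rounded up to integers; the function names are hypothetical and not taken from the paper.

```python
import math
import random

def sup_degree_bound(R, depth, eps):
    """Degree bound (R+1)(depth+1+log R) + depth + log(1/eps) - 1, logs base 2, rounded up."""
    log_r = math.ceil(math.log2(R))
    log_inv_eps = math.ceil(math.log2(1 / eps))
    return (R + 1) * (depth + 1 + log_r) + depth + log_inv_eps - 1

def sample_coefficients(R, depth, eps, rng=random):
    """Draw the two local coding coefficients of a coding node uniformly at random.

    A polynomial of degree <= d over F_2 is encoded as an integer in [0, 2^(d+1)).
    """
    d = sup_degree_bound(R, depth, eps)
    return rng.getrandbits(d + 1), rng.getrandbits(d + 1)

# Example: rate 2, depth 3, target error 1/16 (assumed values).
beta1, beta2 = sample_coefficients(R=2, depth=3, eps=1 / 16)
```

Note that the bound depends only on the node's own depth, $R$ and $\epsilon$, which is what allows each node to choose its coefficients locally.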




[Figure 4: (a) Butterfly network. (b) One possible choice of a flow-set $\mathcal{F}(t_1)$ for the butterfly network.]

Fig. 4. Illustration of the definition of a flow-cut and a flow-set based on the butterfly network topology. The flow-set $\mathcal{F}(t_1)$ comprises the successive flow-cuts $(e_1,e_2)$, $(e_1,e_4)$, $(e_5,e_4)$, $(e_5,e_6)$, $(e_8,e_6)$.



Recall that by assumption the capacity of the network is at least $R$. Hence for each sink $t$ there is a set $\mathcal{P}(t)$ of at least $R$ edge-disjoint paths that go from the source $s$ to $t$.

Corresponding to each such set $\mathcal{P}(t)$ of edge-disjoint paths, we define flow-cuts. A flow-cut $F(t)$ is defined as a set of $R$ edges with the property that each edge in the flow-cut is from a distinct edge-disjoint path in $\mathcal{P}(t)$. These flow-cuts are useful since we intend to analyze the linear (in)dependence of information flowing through the edges in each flow-cut: if the information on the edges of a flow-cut is linearly independent, then the source's information can be retrieved from that flow-cut. Hence, we only need to inductively prove that no information is lost from one flow-cut to the "next" flow-cut, appropriately defined as below.$^{9}$

We define the depth $\Delta(F(t))$ of a flow-cut $F(t)$ as the maximum depth of the head of any edge in it, i.e., $\Delta(F(t)) = \max_{e \in F(t)} \Delta(\mathrm{head}(e))$. Further, we denote a flow-cut of depth $\Delta$ by $F(t,\Delta)$.

We then define a flow-set $\mathcal{F}(t)$ as an ordered set of flow-cuts with the following properties. Each flow-cut in a flow-set differs from the successive flow-cut in exactly one edge. Specifically, if one flow-cut in $\mathcal{F}(t)$ differs from the next flow-cut in $\mathcal{F}(t)$ in that some edge $e$ is replaced by another edge $e'$, then it must be the case that $e$ is the edge preceding $e'$ in some path in the set of edge-disjoint paths $\mathcal{P}(t)$. Intuitively, each flow-set $\mathcal{F}(t)$ captures successive snapshots of how information flows from the source to the sink $t$.



Examples of flow-cuts and flow-sets are provided in Figure 4(b), based on the butterfly network in Figure 4(a).
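The flow-set definition can be checked mechanically. The sketch below is a minimal illustration, assuming that each edge-disjoint path is given as a list of edge labels ordered from source to sink; the function name and the edge labels in the example are hypothetical and are not claimed to match the labelling of Figure 4.

```python
def is_flow_set(paths, flow_cuts):
    """Check the flow-set property for an ordered list of flow-cuts.

    paths: edge-disjoint paths, each a list of edge labels from source to sink.
    flow_cuts: ordered list of tuples of edge labels.
    """
    successor = {}   # edge -> next edge on the same path
    owner = {}       # edge -> index of the path that contains it
    for idx, path in enumerate(paths):
        for edge, nxt in zip(path, path[1:]):
            successor[edge] = nxt
        for edge in path:
            owner[edge] = idx
    for cut in flow_cuts:
        # a flow-cut uses exactly one edge from each edge-disjoint path
        if sorted(owner.get(e, -1) for e in cut) != list(range(len(paths))):
            return False
    for cur, nxt in zip(flow_cuts, flow_cuts[1:]):
        removed = set(cur) - set(nxt)
        added = set(nxt) - set(cur)
        if len(removed) != 1 or len(added) != 1:
            return False          # successive flow-cuts differ in exactly one edge
        (e,), (e_new,) = removed, added
        if successor.get(e) != e_new:
            return False          # the new edge must directly follow the old one
    return True

# Hypothetical labelling: two edge-disjoint paths from the source to one sink.
paths = [["e1", "e5", "e8"], ["e2", "e4", "e6"]]
cuts = [("e1", "e2"), ("e1", "e4"), ("e5", "e4"), ("e5", "e6"), ("e8", "e6")]
print(is_flow_set(paths, cuts))
```

Each step of the sequence advances exactly one path by one edge, which is the "successive snapshot" intuition described above.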

Let $F(t,\Delta)$ be some flow-cut of depth $\Delta$ to sink $t$, and $F'(t)$ be the flow-cut immediately preceding$^{10}$ $F(t,\Delta)$ in the flow-set $\mathcal{F}(t)$. Let $T(F(t,\Delta))$ be the linear transform that the network imposes from the source $s$ to the edges in the flow-cut $F(t,\Delta)$, and let $\rho(\Delta)$ be the rank of this transform. Correspondingly, let $T(F'(t))$ be the linear transform from $s$ to $F'(t)$, and let $\rho'$ be the rank of this transform. The following theorem is proved by bounding from above the probability that choosing local coding coefficients according to the dictates of SUP($\epsilon$) results in a loss of information in going from $F'(t)$ to $F(t,\Delta)$.

Theorem 2. For every $\epsilon > 0$, SUP($\epsilon$) has error probability at most $\epsilon$.

9 Similar intuition was used in the proofs of [6] and [20], where they were called "frontier edge-sets". Note that a flow-cut need not be a cut or a subset of one; for instance, it may include two edges on two edge-disjoint paths, such that one is incoming to a node and the other is outgoing from it.

10 Note that the depth of $F'(t)$ might be either $\Delta$ or $\Delta - 1$, since two successive flow-cuts differ in exactly one edge, which may or may not be the deepest edge in a flow-cut (if not, then both flow-cuts have the same depth; if so, the depth of the flow-cut can change by at most one).

Proof: Note that of the two types of nodes in the virtual graph, the broadcasting nodes induce no additional error: if a flow-cut contains $R$ linearly independent packets, and one of the edges in the flow-cut is replaced with another edge at a broadcasting node, the information in the succeeding flow-cut remains unchanged. Thus from now on we focus only on coding nodes.

By construction the structure of the virtual graph $(V', \mathcal{E}')$ is such that each node can have at most two outgoing edges, and further the source node is replicated $R$ times. Hence the maximum possible number of edges in the virtual graph up to a depth $\Delta$ occurs when it comprises $R$ parallel binary trees. But each binary tree has at most $2(2^{\Delta})$ edges, hence the total number of edges in the virtual graph up to depth $\Delta$ is at most $2R(2^{\Delta})$. Also, the total number of flow-cuts of depth $\Delta$ is at most $\binom{2R(2^{\Delta})}{R}$, which is bounded from above$^{11}$ by $\exp(R(\Delta + 1 + \log R))$.

We use these bounds to bound from above the number of distinct types of coding choices that a coding node at a certain depth faces. All our analysis now focuses on the following specific coding node $v$. Let its incoming edges be $e'$ and $e''$, and the outgoing edge be $e$. Let edge $e'$ belong to a flow-cut $F'(t)$ in the flow-set going towards sink $t$, and edge $e''$ be an arbitrary other edge. Then the outgoing edge $e$ replaces $e'$ in the flow-cut $F'(t)$ to produce flow-cut $F(t)$. Suppose $F(t)$ is of depth $\Delta$. Then by the bounds in the preceding paragraph, the number of ways an arbitrary flow-cut of depth $\Delta$ can result from the merger of a preceding flow-cut and an arbitrary edge of depth at most $\Delta$ is at most $2R(\exp(\Delta)) \times \exp(R(\Delta + 1 + \log R))$, which equals

$$\exp((R + 1)(\Delta + 1 + \log R)). \qquad (1)$$
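The counting steps leading to bound (1) can be sanity-checked numerically. The sketch below assumes $R$ is a power of two so that $\log R$ is an integer; the helper names are hypothetical.

```python
import math

def flow_cut_count_bound(R, depth):
    """exp(R(depth + 1 + log R)): upper bound on the number of flow-cuts of a given depth."""
    return 2 ** (R * (depth + 1 + int(math.log2(R))))

def coding_choice_bound(R, depth):
    """Bound (1): exp((R + 1)(depth + 1 + log R)), with exp taken base 2."""
    return 2 ** ((R + 1) * (depth + 1 + int(math.log2(R))))

for R in (1, 2, 4, 8):                  # powers of two, so log R is an integer
    for depth in range(1, 7):
        edges = 2 * R * 2 ** depth      # at most 2R(2^depth) edges up to this depth
        # choosing R edges out of these is at most edges^R
        assert math.comb(edges, R) <= flow_cut_count_bound(R, depth)
        # (edges) x (flow-cut bound) collapses to bound (1)
        assert edges * flow_cut_count_bound(R, depth) == coding_choice_bound(R, depth)
```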

Next, we estimate the probability that a coding node "loses information". That is, we bound from above 
the probability that the number of linearly independent packets on the edges of a flow-cut is less than R 
even though the immediately preceding flow-cut has R linearly independent packets. 

Say $T(F(t,\Delta))$ represents the $R \times R$ matrix whose $i$th row represents the linear combination of the source's $R$ messages on the $i$th link in the flow-cut $F(t,\Delta)$ of depth $\Delta$. Correspondingly, let $T(F'(t))$ represent the matrix representing the linear transform from the source to the flow-cut $F'(t)$ immediately preceding $F(t,\Delta)$ in the flow-set $\mathcal{F}(t)$, and suppose it is of full rank $R$. Then the message $Y_e(z)$ on edge $e$ in flow-cut $F(t,\Delta)$ may be written as $\beta_{e',e}(z)Y_{e'}(z) + \beta_{e'',e}(z)Y_{e''}(z)$. (Recall that $e'$ and $e''$ are the edges incoming to $v$, $Y_{e'}(z)$ and $Y_{e''}(z)$ are the corresponding messages carried by them, and $\beta_{e',e}(z)$ and $\beta_{e'',e}(z)$ represent the local coding coefficients at node $v$.) But by assumption $T(F'(t))$ is of full rank, and hence the message $Y_{e''}(z)$ may be written as a linear combination of the messages on the edges in flow-cut $F'(t)$. Thus the message on edge $e$ may be written as

$$\beta_{e',e}(z) Y_{e'}(z) + \sum_{e(i) \in F'(t)} \gamma_{e(i),e}(z) Y_{e(i)}(z)$$

for some $\gamma_{e(i),e}(z) \in \mathbb{F}_2(z)$. Writing $\gamma_{e'',e}(z)$ for the coefficient in this sum that multiplies $Y_{e'}(z)$, i.e., the contribution of $Y_{e''}(z)$ along $Y_{e'}(z)$, this in turn equals

$$(\beta_{e',e}(z) + \gamma_{e'',e}(z)) Y_{e'}(z) + \sum_{e(i) \in F'(t) : e(i) \neq e'} \gamma_{e(i),e}(z) Y_{e(i)}(z).$$

But the information on the links in $F(t,\Delta)$ other than $e$ is unchanged, and hence the only manner in which the messages on the edges in $F(t,\Delta)$ can be linearly dependent is if the coefficient $(\beta_{e',e}(z) + \gamma_{e'',e}(z))$ equals zero. But by the choice specified in SUP($\epsilon$) the coding coefficients are chosen from the set of polynomials of degree at most $(R + 1)(\Delta + 1 + \log R) + \Delta + \log(1/\epsilon) - 1$; this set is of size $\exp((R + 1)(\Delta + 1 + \log R) + \Delta + \log(1/\epsilon))$. Lemma 2 then implies that the probability that the degree-1 polynomial $(\beta_{e',e}(z) + \gamma_{e'',e}(z))$ equals zero is at most $\exp(-[(R + 1)(\Delta + 1 + \log R) + \Delta + \log(1/\epsilon)])$. Analogously to Theorem 1, taking the union bound over all possible coding operations at depth $\Delta$ (for which (1) is an upper bound), and summing over all (possibly infinitely many) depths $\Delta$ gives us that the overall probability of error is at most

$$\sum_{\Delta=1}^{\infty} \frac{2^{(R+1)(\Delta+1+\log R)}}{2^{\Delta + (R+1)(\Delta+1+\log R) + \log(1/\epsilon)}} = \sum_{\Delta=1}^{\infty} \frac{\epsilon}{2^{\Delta}} = \epsilon. \qquad \Box$$

11 Since $\binom{a}{b} \le a^b$. Also, all exponents exp are base 2.
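The final union bound is a geometric series, which the following sketch evaluates exactly with rational arithmetic. It assumes $\epsilon = 2^{-k}$ and passes $\log R$ and $k$ as integers; the function names are hypothetical.

```python
from fractions import Fraction

def per_depth_error(R, log_r, depth, log_inv_eps):
    """(number of coding choices at this depth) x (failure probability of a single choice)."""
    choices = 2 ** ((R + 1) * (depth + 1 + log_r))                       # bound (1)
    set_size = 2 ** ((R + 1) * (depth + 1 + log_r) + depth + log_inv_eps)
    return Fraction(choices, set_size)                                    # equals eps / 2^depth

def total_error(R, log_r, log_inv_eps, max_depth):
    """Union bound summed over depths 1..max_depth."""
    return sum(per_depth_error(R, log_r, d, log_inv_eps) for d in range(1, max_depth + 1))

eps = Fraction(1, 2 ** 5)
partial = total_error(R=4, log_r=2, log_inv_eps=5, max_depth=40)
# Partial sums of the geometric series stay strictly below eps.
assert partial == eps * (1 - Fraction(1, 2 ** 40))
```

The extra $\Delta$ in the degree bound is what makes the per-depth terms decay geometrically, so the sum converges to $\epsilon$ even when the depth is unbounded.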



D. Robustness



Due to the robust graph transformation described in Section IV-A, neither the addition nor the deletion of edges or nodes in the network causes problems with our proof. If new nodes are added to the network, the actual depth of some nodes in the network, say $v$ in particular, may decrease. However, we require each node $v$ to use in perpetuity the value of the depth $\Delta(v)$ it estimates in the first round of communication. This ensures that the bound on the number of coding nodes at a particular depth $\Delta$ is not violated. Conversely, if nodes leave the network, the actual depth of $v$ may increase. Nonetheless, the bound on the number of coding nodes at a particular depth $\Delta$ is still not violated, since the total number of nodes in the network has only decreased.



E. Complexity analysis 

The complexity of both WUP($\epsilon$, $|T|$) and SUP($\epsilon$) scales with the corresponding degree of the polynomials chosen by nodes as coding coefficients.

For WUP($\epsilon$, $|T|$), by construction, the degree of the polynomials used as coding coefficients (which, as noted in [21], is a good proxy for the implementation complexity of codes) scales as the maximum depth of the virtual graph plus $O(\log(R|T|/\epsilon))$. But in the virtual graph each original node is replaced with a gadget whose depth is approximately the logarithm of the degree of the node, which in the worst case is at most$^{12}$ $c|V|$. Hence the maximum depth of the virtual graph is at most $O(\log(c|V|))$ times the number of vertices in the original graph. Pulling it all together, the complexity of implementation scales as $O(|V|\log(c|V|) + \log(R|T|/\epsilon))$.

The corresponding redundancy the network introduces in the codes, arising from the delays introduced by each coding node, then scales at worst as the maximum depth of the virtual graph times the maximum degree of the polynomials used by any node, since each coding node in a path can introduce at most the maximal delay and delays along a path add up. Using the bounds above, this scales as $O(|V|^2\log^2(c|V|) + |V|\log(c|V|)\log(R|T|/\epsilon))$.

A similar analysis shows that the complexity of implementation of SUP($\epsilon$) scales as $O(R|V|\log^2(c|V|) + |V|\log^2(c|V|)\log(1/\epsilon))$, and that the redundancy introduced by such codes scales as $O(R|V|^3\log^3(c|V|) + |V|^2\log^3(c|V|)\log(1/\epsilon))$.

We now demonstrate that the implementation complexity of WUP($\epsilon$, $|T|$) and SUP($\epsilon$) is in fact order-optimal by exhibiting a class of networks for which any universal design requires computational complexity that is similar in order of magnitude to that of WUP($\epsilon$, $|T|$) and SUP($\epsilon$). Our construction is inspired by that of [10]. Consider any universal $\epsilon$-error linear network code, i.e., any network code that requires that each sink be able to reconstruct the source's information with probability of error at most $\epsilon$, even if the network topology is not known in advance.

Theorem 3. There exists a class of networks for which the implementation complexity for any $\epsilon$-error universal network code is $\Omega(|\mathcal{E}| - \log(1/\epsilon))$.

Proof: We construct a single-source multicast network for which any universal code must choose its coding operations from a set that is exponentially larger than would actually be needed if the topology were known in advance. Consider the "binary-tree-like" network shown in Figure 5. The upper part of this graph comprises a binary tree of depth $\Delta$, with the source $s$ located at the root of the tree, and hence $2^{\Delta}$ leaf nodes. Each link of this binary tree has capacity 2. Next, each leaf node of this binary tree has a link of unit capacity leaving it to a corresponding forwarding node. Finally, we consider two possibilities. Either there are $\binom{2^{\Delta}}{2}$ sinks, such that each sink is connected to a distinct subset of size two of the set of $2^{\Delta}$ forwarding nodes, via unit capacity links to each of the two nodes in the subset (this is in the spirit of the combination networks examined in [10]). Or, there is only one sink node in the network, which is connected to two of these forwarding nodes (say $v_i$ and $v_j$) via links of unit capacity each.

12 Recall that $c$ denotes the maximum value for any edge's capacity in the network.

Fig. 5. Example demonstrating that the implementation complexities of WUP($\epsilon$, $|T|$) and SUP($\epsilon$) are order-optimal.

As to coding strategies, each node in the binary tree part of the network can forward two linearly independent messages $X_1(z)$ and $X_2(z)$ on each of its outgoing links. Hence at depth $\Delta$ each of the $2^{\Delta}$ leaves of the binary tree has both $X_1(z)$ and $X_2(z)$. Since neither of the two leaf nodes corresponding to $v_i$ or $v_j$ (call them $u_i$ and $u_j$ respectively) is aware of which of the two configurations the network is in, to be universal they must use coding operations that work for both configurations.

First we consider the case of $\epsilon = 0$, i.e., every message must be decoded correctly. Suppose the leaf nodes $u_i$ and $u_j$ choose to transmit the linear combinations $A_1(z)X_1(z) + B_1(z)X_2(z)$ and $A_2(z)X_1(z) + B_2(z)X_2(z)$, or equivalently $X_1(z) + \alpha_1(z)X_2(z)$ and $X_1(z) + \alpha_2(z)X_2(z)$ (by setting $\alpha_i(z) = B_i(z)/A_i(z)$ for $i = 1, 2$). But the messages on the forwarding links must be pairwise linearly independent (to account for the configuration with $\binom{2^{\Delta}}{2}$ sinks). Hence there must be at least $2^{\Delta} - 1$ choices for the $\alpha_i(z)$s (one for each of the leaf nodes, minus one for the case when $A_i(z) = 0$, which can be handled separately). But if the set of possible coding operations is of size $\Omega(2^{\Delta})$, then its implementation complexity must be at least $\Omega(\Delta)$. But this is $\Omega(|\mathcal{E}|)$. In contrast, if the network happened to be in the configuration with only one sink and this was known in advance, then each of $u_i$ and $u_j$ could simply forward one bit, for an implementation complexity of 1.

The case with $\epsilon$-errors can be similarly analyzed, by allowing sinks to make errors a fraction $\epsilon$ of the time. A direct counting argument gives the required result. □
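The counting step in the zero-error case can be illustrated with a small sketch. It assumes polynomials over the binary field encoded as integers, so that subtraction is XOR; the names are hypothetical. Two vectors $(1, \alpha_i)$ and $(1, \alpha_j)$ are independent exactly when $\alpha_i \neq \alpha_j$, so $2^{\Delta}$ leaves need $2^{\Delta}$ distinct coefficients, and coefficients with fewer than $\Delta$ bits must collide.

```python
from itertools import combinations

def independent(alpha_i, alpha_j):
    """(1, alpha_i) and (1, alpha_j) are independent iff the determinant alpha_j - alpha_i != 0.

    Polynomials over F_2 are encoded as integers, so the difference is a XOR.
    """
    return (alpha_i ^ alpha_j) != 0

depth = 4
alphas = list(range(2 ** depth))          # one distinct depth-bit coefficient per leaf
assert all(independent(a, b) for a, b in combinations(alphas, 2))

# With only depth - 1 bits there are too few coefficients: some pair of leaves collides.
short = [a % 2 ** (depth - 1) for a in alphas]
assert not all(independent(a, b) for a, b in combinations(short, 2))
```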



V. Deterministic designs 

In this section we describe two deterministic designs of universal distributed robust network codes that are zero-error.$^{13}$

Our first scheme is only for codes of rate 2. It is related to a construction of [14], but generalizes it so that the choice of coding operations is independent of the size of the network. We call our scheme the Rate 2 Deterministic Design, or R2-D2 for short. Our purpose in presenting this first scheme is primarily expository: since its proof is significantly easier than that of the second scheme, it helps set the stage for the second scheme.

13 Preliminary versions of the proofs in this section appeared in the thesis [15].

Our second scheme is for general rates and is independent of all network parameters, including the number of sinks. We call this scheme the Capacity 3 or more, Probability of error 0 scheme, or C3-P0 for short. However, C3-P0 is more of an existence result than a practical code since the computational complexity of its implementation is exponential in network parameters.

We first describe some useful preprocessing steps relevant for both of our schemes. 



A. Robust distributed unique ID assignment 



While the codes in Section IV only required nodes to estimate their depth, the zero-error codes presented in this section require nodes to obtain a unique ID, i.e., an ID that is distinct for each node in the network. Such an ID allows nodes to loosely coordinate coding choices even if they are unable to communicate directly with each other, and thereby ensure that the overall code is "good". Such IDs might be pre-assigned to nodes (for example via factory stamps, or GPS coordinates, or IP addresses), or be assigned on the fly, as described below.

The task of distributing unique IDs to nodes over a directed graph was considered in [22]. The essential idea of their algorithm is to pretend that the graph is a tree directed from the root to the leaves (if not, extra edges are removed for the ID assignment protocol), and to assign IDs so that the binary expansion of each node's ID is a prefix of the binary expansion of the IDs of all nodes downstream from it. This ID distribution can be carried out with communication cost that is asymptotically negligible in the packet length, in conjunction with the normal flow of information through the network, for instance in the header. Here, as in Section IV-A, we need to change the unique ID distribution protocol slightly to make it robust to network changes, so that new nodes are still ensured that IDs assigned to them do not clash with previously assigned IDs. In the same spirit as the robust virtual gadgets in Section IV-A, at each node $v$ we reserve a virtual ID for the event that a new node might in the future connect to $v$; if so, this virtual ID is again split into another virtual ID, and an ID that is assigned to the new node. As noted in [22], the worst-case growth rate of the largest node ID with the network size is exponential in $|V|$, for reasons similar to those outlined in Theorem 3: nodes might be unable to distinguish between a full binary tree and a very sparse graph.
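One way to picture the virtual-ID splitting is sketched below. This is only an illustrative scheme under stated assumptions, not the protocol of [22]: IDs are binary strings, each node reserves its own ID followed by "1" as its virtual ID, and a joining child receives the virtual ID followed by "0" while the parent keeps the virtual ID followed by "1" in reserve. The class and method names are hypothetical.

```python
class Node:
    """Node holding a binary-string ID and a reserved virtual ID for future children."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.virtual_id = node_id + "1"    # reserved for nodes that may join later
        self.children = []

    def attach_child(self):
        """Split the virtual ID: one part goes to the new child, the other stays reserved."""
        child = Node(self.virtual_id + "0")
        self.virtual_id = self.virtual_id + "1"
        self.children.append(child)
        return child

def all_ids(node):
    """Yield the IDs of a node and of every node downstream from it."""
    yield node.node_id
    for child in node.children:
        yield from all_ids(child)

root = Node("0")
a, b = root.attach_child(), root.attach_child()
c = a.attach_child()
d = root.attach_child()                    # a late joiner still receives a fresh ID
ids = list(all_ids(root))
assert len(ids) == len(set(ids))           # all IDs are distinct
assert c.node_id.startswith(a.node_id)     # upstream IDs are prefixes of downstream IDs
```

In this sketch the ID length grows with the number of children attached, which is in line with the remark above that the largest ID, read as an integer, can grow exponentially with the network size.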



B. Cantor labeling 

The well-known Cantor diagonal argument [23] makes an unexpected cameo in this work. One version states that the cardinality of the set of integers is the same as that of the set of finite-dimensional vectors with integer components, and further gives an effective bijection between the two sets. Further, this bijection guarantees that any vector in $\mathbb{Z}^k$ with maximum component $l$ is mapped to an integer of size $O(l^k)$. This mapping is useful since, given a unique ID for each node $v$, we then need to produce unique coding coefficients for each pair of edges such that one is incoming to $v$ and the other is outgoing from $v$. Prior to code design, the number of such coefficients that each node might need to choose is unknown. However, each coefficient $\beta_{u,i,v,j,w}(z)$ can be labeled by at most the five indices $(u,i,v,j,w)$, each of which is an integer. Hence given a node's unique ID, one can produce unique integral labels for each vector $(u,i,v,j,w)$ that are not too much larger (at most the fifth power) than any of the five parameters in $(u,i,v,j,w)$. This mapping, denoted $K(\cdot)$, can then be used to select distinct local coding coefficients as needed in Sections V-C and V-D.
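The text does not fix a particular bijection, so the sketch below assumes one standard choice: the degree-$k$ generalization of Cantor's pairing polynomial on non-negative integers, $\sum_{i=1}^{k} \binom{i - 1 + x_1 + \cdots + x_i}{i}$, whose value is $O(l^k)$ for fixed $k$ when every component is at most $l$. The function name is hypothetical.

```python
from itertools import product
from math import comb

def cantor_tuple(xs):
    """Map a tuple of non-negative integers to a unique non-negative integer.

    Uses the degree-k generalization of Cantor's pairing polynomial:
    the sum over i of C(i - 1 + x_1 + ... + x_i, i).
    """
    total, label = 0, 0
    for i, x in enumerate(xs, start=1):
        total += x
        label += comb(i - 1 + total, i)
    return label

# Sanity check for k = 3: tuples with component sum <= S map exactly onto 0 .. C(S+3, 3) - 1.
S = 5
labels = sorted(cantor_tuple(t) for t in product(range(S + 1), repeat=3) if sum(t) <= S)
assert labels == list(range(comb(S + 3, 3)))
```

For $k = 2$ this reduces to the familiar pairing $x_1 + (x_1 + x_2)(x_1 + x_2 + 1)/2$; the 3- and 5-dimensional labels used later would correspond to calls with tuples of length 3 and 5.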



C. Rate 2 zero-error codes 

For the case when the transmission rate equals 2, note that there are essentially just two non-trivial scenarios for each node: either a node receives one linearly independent message on incoming links, or it receives two. In the former case, it can only broadcast incoming information on outgoing links. In the latter case, it can reconstruct the source's information, and thereby can fully control the linear combinations on outgoing links. Our construction for R2-D2 rests on analysis of these cases.

Fig. 6. Example showing the mapping between $\mathbb{Z}$ and $\mathbb{Z}^2$.
R2-D2 (Rate 2 Deterministic Design)

• The source $s$ has two linearly independent messages $X_1(z)$ and $X_2(z)$.

• Depending on its connectivity to the source, on incoming edges each node $v \in V$ receives either one or two linearly independent combinations of the source messages $(X_1(z), X_2(z))$.

• If a node $v$ receives only one linearly independent message on incoming links, it broadcasts it down all outgoing edges.

• If a node $v$ receives two linearly independent combinations of $(X_1(z), X_2(z))$, this enables it to reconstruct both $X_1(z)$ and $X_2(z)$. For each $j$th directed edge connecting each pair of nodes $v$, $w$ (connected possibly by multiple parallel edges), we use the Cantor labeling algorithm$^{14}$ in Section V-B to assign a distinct local coding coefficient. In particular, let $K(v,j,w)$ denote the 3-dimensional Cantor mapping. Then the node $v$ transmits $X_1(z) + \beta_{K(v,j,w)}(z)X_2(z)$ down the $j$th edge connecting $v$ to $w$ (here $\beta_{K(v,j,w)}(z)$ is chosen to be distinct for each $K(v,j,w)$).

Theorem 4. For any network $\mathcal{G}$ with min-cut capacity at least 2, R2-D2 succeeds with zero error.

Proof: For any $v \in V$ such that the min-cut between the source and $v$ is at least 2, there are at least two edge-disjoint paths from the source to $v$. By the statement of our R2-D2 algorithm, for any such nodes $v$ and $v'$, the linear combinations of $X_1(z)$ and $X_2(z)$ on all their outgoing links must be distinct, and linearly independent (since the vectors $(1, \beta_{K(v,j,w)}(z))$ and $(1, \beta_{K(v',j',w')}(z))$ are linearly independent if and only if $\beta_{K(v,j,w)}(z)$ and $\beta_{K(v',j',w')}(z)$ are distinct). □
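The independence argument of Theorem 4 can be sketched in a few lines. The code assumes a particular 3-dimensional Cantor labeling (the generalized Cantor polynomial on non-negative integers) and assumes that the coefficient is the binary-field polynomial whose bit pattern is the label; both are illustrative choices, and all names are hypothetical.

```python
from math import comb

def cantor_label(v, j, w):
    """Hypothetical 3-dimensional Cantor label K(v, j, w) for non-negative integers."""
    s1, s2, s3 = v, v + j, v + j + w
    return comb(s1, 1) + comb(1 + s2, 2) + comb(2 + s3, 3)

def outgoing_vector(v, j, w):
    """Coefficient vector (1, beta) sent on the j-th edge from v to w.

    Assumption: beta is the F_2 polynomial whose bit pattern is K(v, j, w),
    so distinct labels give distinct polynomials.
    """
    return (1, cantor_label(v, j, w))

def independent(vec_a, vec_b):
    """2x2 determinant over F_2[z]; with leading 1s it is beta_a + beta_b, i.e. a XOR."""
    return (vec_a[1] ^ vec_b[1]) != 0

# Every edge (v, j, w) of a small hypothetical multigraph gets its own vector.
edges = [(v, j, w) for v in range(4) for j in range(2) for w in range(4) if v != w]
vectors = [outgoing_vector(*e) for e in edges]
assert all(independent(a, b) for i, a in enumerate(vectors) for b in vectors[i + 1:])
```

Any sink that receives two such vectors can therefore invert the resulting 2 by 2 system, regardless of which decoding nodes produced them.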

D. General zero-error codes 



The challenge in extending the results of Section V-C to rates greater than 2 lies in the fact that there might be nodes that receive two or more linearly independent pieces of information and yet are unable to decode the source messages. In this case, they do not have full control over the messages they are able to send out, and hence the argument of Theorem 4 fails. In this section, we get around this challenge by examining a different invariant of linear convolutional network codes. In particular, we choose coding coefficients in a distributed manner so that the delay of the source messages on every path in the network is distinct. This means that the source messages never cancel out at the sinks, and hence can be reconstructed.

14 In Section V-B we assume that $u$ and $i$ are also variable, but in Section V-C they are fixed.



[Figure 7: the binary expansions of $2^i$, $2^j$ and $2^k$, namely (000000100), (000010000) and (010000000), add up to (010010100).]

Fig. 7. Graphic representation of the proof of correctness of C3-P0 codes.



C3-P0 (Capacity 3 or more, Probability of error 0) codes

• For each 5-tuple $(u,i,v,j,w)$, let $K(u,i,v,j,w)$ be the 5-dimensional Cantor mapping defined in Section V-B. We define the local coding coefficient $\beta_{u,i,v,j,w}(z)$ as

$$\beta_{u,i,v,j,w}(z) = z^{\exp(K(u,i,v,j,w))},$$

i.e., the monomial in $z$ with degree $\exp(K(u,i,v,j,w))$ (here the exponent is base 2).

Theorem 5. For any network $\mathcal{G}$, C3-P0 succeeds with zero error.

Proof: Theorem 4 in [1] demonstrates that $|T_t|$, the determinant of the transfer matrix $T_t$ from the source to any sink $t$, can be written as $\sum_{\mathcal{P}} c_{\mathcal{P}} \prod_{\mathcal{P}} \beta_{u,i,v,j,w}(z)$, where the product is over all the local coding coefficients on a particular path $\mathcal{P}$ from $s$ to $t$, $c_{\mathcal{P}}$ is a non-zero constant corresponding to $\mathcal{P}$, and the outer summation is over all paths from $s$ to $t$. Our choice of local coding coefficients along any path in C3-P0 implies that $|T_t|$ equals

$$\sum_{\mathcal{P}} c_{\mathcal{P}} \prod_{\mathcal{P}} z^{\exp(K(u,i,v,j,w))}. \qquad (2)$$

But by choice, each of the terms $K(u,i,v,j,w)$ is distinct, and hence the binary expansion of each $\exp(K(u,i,v,j,w))$ has a single 1 in a distinct location. But if two paths in the summation differ, then they must differ in at least one of the local coding coefficients, and therefore the exponents of the powers of $z$ along the two paths must differ; hence each path corresponds to a distinct power of $z$, and no two terms in (2) can cancel. This implies that as long as there is at least one path from $s$ to each $t \in T$, each of the corresponding transforms $T_t$ must be invertible. □
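The key step of this proof, that distinct sets of labels yield distinct powers of $z$, is just the uniqueness of binary expansions. A minimal sketch, with hypothetical label values standing in for the Cantor labels on each path:

```python
def path_exponent(labels):
    """Degree of the product of monomials z^(2^K) along a path: the sum of 2^K over its labels."""
    return sum(1 << k for k in labels)

# Hypothetical Cantor labels of the local coding coefficients on three distinct paths.
paths = [frozenset({2, 4, 7}), frozenset({2, 5, 7}), frozenset({1, 4})]
exponents = [path_exponent(p) for p in paths]

# Distinct label sets give distinct bit patterns, hence distinct powers of z.
assert len(set(exponents)) == len(paths)
assert bin(path_exponent({2, 4, 7})) == "0b10010100"
```

Since each label contributes a single 1 at its own bit position, the exponent of a path is a bitmask of the coefficients it uses, which is the picture shown in Fig. 7.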



E. Complexity Analysis 

The complexity of both R2-D2 and C3-P0 scales with the corresponding Cantor labeling and node ID assignments.

For R2-D2 the size of the set from which any node chooses its coding coefficients scales as the third power of the largest node ID or the largest link-capacity in the network. But as noted in Section V-A, the largest node ID can scale exponentially in $|V|$. Hence the degree of the polynomials used as coding coefficients, which scales logarithmically in the size of the sets from which local coding coefficients are chosen, scales as $O(\max\{|V|, \log^3(c)\})$. The corresponding redundancy the network introduces in the codes, arising from the delays introduced by each coding node, then scales as $O(|V| \max\{|V|, \log^3(c)\})$, since each coding node in a path can introduce at most the maximal delay and delays along a path add up.

A similar analysis shows that the complexity of implementation of C3-P0 scales as $O(\exp(\max\{|V|, \log^5(c)\}))$, and that the redundancy introduced by such codes scales as $O(|V| \exp(\max\{|V|, \log^5(c)\}))$.

The problem of polynomial identity testing (PIT) [24] examines the question of deterministically determining whether a polynomial with a succinct but non-standard representation (such as the determinant of a matrix of polynomials) identically equals zero. The deterministic complexity of such problems is a long-standing open problem in theoretical computer science. Given this context, we are unable to provide intuition on whether our codes in Section V-D have order-optimal computational complexity; indeed, answering this question in either direction would represent significant progress in resolving the complexity of PIT problems.



VI. Implementation issues 

As noted in [21], the complexity of implementation of network codes scales polynomially in the logarithm of the field-size over which operations are performed, or in the case of convolutional network codes, polynomially in the degree of the polynomials used at each node. By this measure, the implementation complexity of the codes in [1], [18] is poly-logarithmic in network parameters, whereas the implementation complexity of the first three of the four codes in this work is polynomial in network parameters. While this is an exponential blow-up, we note that the resulting codes are still computationally tractable, and further, as noted in Section IV-E, such a blow-up is in fact necessary for codes to be universal.

A SUMMARY OF THE PROPERTIES OF OUR CONSTRUCTIONS, AND COMPARISON BETWEEN THEM AND PRIOR NON-UNIVERSAL ALGORITHMS.



While the schemes in this work have been presented in the context of convolutional network coding operations at each node, they also go through for other infinite fields such as $\mathbb{Q}$; the only requirement is that the field be unbounded in size, and that an infinite subset of it have a succinct representation.

Also, despite presenting all messages at the source and each link as bit-streams of possibly unbounded 
length, the schemes described in prior sections can also be implemented by packetization, by chopping 
up the bit-streams into packets of a standard size n. 

In our codes the header of each packet contains low-rate control information used by each node to 
decide on its coding operations. However, by design, the size of this header changes as information flows 
down the network - the rate of change depends on the network topology, and hence is unpredictable in 
advance. One challenge in the implementation of our codes is thus to ensure that the intermediate nodes 
are able to distinguish between header information and payload information. One standard trick for such scenarios is used in Theorem 14.2.3 of [25]: each bit of the header is doubled, and the final such double-bit is followed by a 01 to signify the end of the header. Since the length of the header is asymptotically negligible in the packet-size, the communication cost of this bit-doubling is still asymptotically negligible.
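The bit-doubling trick is simple enough to sketch directly. The example below assumes packets are represented as strings of the characters "0" and "1"; the function names are hypothetical.

```python
def encode_header(header_bits, payload_bits):
    """Double every header bit, then append '01' to mark the end of the header."""
    doubled = "".join(bit * 2 for bit in header_bits)
    return doubled + "01" + payload_bits

def decode_packet(packet_bits):
    """Split a packet into (header, payload) by scanning bit pairs until the '01' marker."""
    header = []
    pos = 0
    while pos + 2 <= len(packet_bits):
        pair = packet_bits[pos:pos + 2]
        pos += 2
        if pair == "01":
            return "".join(header), packet_bits[pos:]
        if pair not in ("00", "11"):
            raise ValueError("malformed header")
        header.append(pair[0])
    raise ValueError("missing end-of-header marker")

packet = encode_header("1011", "0100111")
assert decode_packet(packet) == ("1011", "0100111")
```

Because doubled header bits only ever produce the pairs 00 and 11, the first 01 pair unambiguously marks the boundary, whatever the header length turns out to be at a given node.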

VII. Discussion 

In this work we provide the first rate-optimal network code designs that have guaranteed decodability performance, and yet are independent of all network parameters. While requiring such universality makes us pay a price in computational complexity and redundancy, (all but one of) our codes are computationally efficient to implement. The analytical tools we derive may well be of independent interest.

Appendix

A. Proof of Lemma 2

We proceed by mathematical induction. In the base case when $N = 1$, Lemma 2 is equivalent to the Schwartz-Zippel lemma in one variable.

As the inductive hypothesis, suppose that Lemma 2 is true for $N - 1$ variables in the polynomial $P(\cdot)$.

Now consider the case when the polynomial $P(x_1, x_2, \ldots, x_N)$ has $N$ variables. The polynomial can be rewritten so that

$$P(x_1, x_2, \ldots, x_N) = x_N^{d_N} P_1(x_1, x_2, \ldots, x_{N-1}) + R_1(x_1, x_2, \ldots, x_N)$$

for some polynomials $P_1(\cdot)$ and $R_1(\cdot)$ over the appropriate variables, where $d_N$ is the degree of $P(\cdot)$ in $x_N$ and $R_1(\cdot)$ has degree less than $d_N$ in $x_N$.

The probability that $P(\cdot)$ equals zero can be bounded from above by

$$\begin{aligned}
\Pr[P(\cdot) = 0] &= \Pr[P(\cdot) = 0, P_1(\cdot) = 0] + \Pr[P(\cdot) = 0, P_1(\cdot) \neq 0] \\
&= \Pr[P_1(\cdot) = 0]\,\Pr[P(\cdot) = 0 \mid P_1(\cdot) = 0] + \Pr[P_1(\cdot) \neq 0]\,\Pr[P(\cdot) = 0 \mid P_1(\cdot) \neq 0] \\
&\le \Pr[P_1(\cdot) = 0] + \Pr[P(\cdot) = 0 \mid P_1(\cdot) \neq 0]. \qquad (3)
\end{aligned}$$

But by the inductive hypothesis

$$\Pr[P_1(\cdot) = 0] \le \sum_{i=1}^{N-1} \frac{d_i}{|S_i|}, \qquad (4)$$

where $d_i$ is the degree of $P(\cdot)$ in $x_i$ and $S_i$ denotes the set from which $x_i$ is chosen.

Also, by the Principle of Deferred Decisions [26] the probability $\Pr[P(\cdot) = 0]$ is unaffected if the value of $x_N$ is chosen after the values of all the other variables have been fixed. In this case, if $P_1(\cdot) \neq 0$, then $P(\cdot)$ is a polynomial of degree $d_N$ over $x_N$. By the Schwartz-Zippel lemma

$$\Pr[P(\cdot) = 0 \mid P_1(\cdot) \neq 0] \le \frac{d_N}{|S_N|}. \qquad (5)$$

Substituting (4) and (5) into (3) gives the required result. □
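The bound obtained by this induction can be checked by brute force on a small example. The sketch below assumes that the bound takes the form of a sum, over the variables, of the degree in that variable divided by the size of the set it is drawn from, as the induction suggests; the polynomial, the sets and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def zero_probability(poly, sets):
    """Exact probability that poly vanishes when x_i is uniform on sets[i], independently."""
    points = list(product(*sets))
    zeros = sum(1 for p in points if poly(*p) == 0)
    return Fraction(zeros, len(points))

def nonuniform_bound(degrees, sets):
    """Sum over the variables of d_i / |S_i|."""
    return sum(Fraction(d, len(s)) for d, s in zip(degrees, sets))

# Hypothetical example: P(x1, x2) = (x1 - 1)(x2^2 - 4), of degree 1 in x1 and 2 in x2.
poly = lambda x1, x2: (x1 - 1) * (x2 * x2 - 4)
sets = [range(0, 3), range(-5, 5)]      # sets of different sizes: |S_1| = 3, |S_2| = 10
assert zero_probability(poly, sets) <= nonuniform_bound([1, 2], sets)
```

Here the variables are drawn from sets of sizes 3 and 10, which is exactly the non-uniform setting that the classical lemma with a single set size does not cover directly.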



